Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods
Abstract
The ball enters the hole only if it is hit with a velocity v0 no greater than v̄0 = √(2gkx0 + vmax²), where g is the gravitational acceleration, k = 0.0305 is the coefficient of friction between the ball and the ground, x0 is the initial distance of the ball from the hole, and vmax is the maximum velocity allowed at the border of the hole in order for the ball to fall in rather than roll past it. Assuming that the ball has a diameter of 4.5 cm and that the hole has a diameter of 7.5 cm, vmax is equal to 110.7 cm/s. At the beginning of each trial the ball is placed at a random distance between 0 cm and 2000 cm from the hole. The hitting velocity is limited to the interval [0, 500] cm/s. When the ball enters the hole, the episode ends with reward 0. If v0 > v̄0, the ball is lost and the episode ends with reward −10. Finally, if v0 < v̲0 = √(2gkx0), the minimum velocity needed to reach the hole, the ball stops short, the episode goes on, and the agent can try another hit with reward −1. The state variable x is discretized into ten 200 cm wide intervals. This discretization has been chosen so that, for each interval, there is a single value of the velocity that makes the ball enter the hole independently of the actual position within the interval.
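A minimal runnable sketch of the dynamics just described, assuming g = 981 cm/s², that a hit falling short advances the ball by its stopping distance v0²/(2gk), and illustrative names such as MiniGolfEnv and state_index that are not from the paper:

```python
import math
import random

G = 981.0      # gravitational acceleration in cm/s^2 (assumed value)
K = 0.0305     # friction coefficient between ball and ground
V_MAX = 110.7  # max velocity (cm/s) at the hole border that still lets the ball drop in

class MiniGolfEnv:
    """Sketch of the mini-golf domain described in the excerpt above."""

    def reset(self):
        # The ball starts at a random distance in [0, 2000] cm from the hole.
        self.x = random.uniform(0.0, 2000.0)
        return self.x

    def step(self, v0):
        # Hitting velocities are limited to [0, 500] cm/s.
        v0 = max(0.0, min(500.0, v0))
        v_min = math.sqrt(2.0 * G * K * self.x)               # slowest hit that reaches the hole
        v_sup = math.sqrt(2.0 * G * K * self.x + V_MAX ** 2)  # fastest hit that still drops in
        if v0 > v_sup:
            return self.x, -10.0, True   # ball rolls past the hole and is lost
        if v0 >= v_min:
            return self.x, 0.0, True     # ball enters the hole
        # Ball stops short: friction decelerates it over v0^2 / (2 g k) cm (assumed dynamics).
        self.x -= v0 ** 2 / (2.0 * G * K)
        return self.x, -1.0, False

def state_index(x):
    # Discretize the distance into ten 200 cm wide intervals.
    return min(int(x // 200.0), 9)
```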
Similar resources
Reinforcement Learning in Continuous Action Spaces through Sequential Monte Carlo Methods
Learning in real-world domains often requires dealing with continuous state and action spaces. Although many solutions have been proposed to apply Reinforcement Learning algorithms to continuous state problems, the same techniques can hardly be extended to continuous action spaces, where, besides the computation of a good approximation of the value function, a fast method for the identification...
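As a rough illustration of the sequential Monte Carlo idea this abstract alludes to, one can represent the candidate actions for a state as a set of particles and periodically resample them toward high-value regions. The softmax weighting and Gaussian perturbation below are assumptions made for the sketch, not the paper's exact scheme:

```python
import numpy as np

def resample_action_particles(particles, q_values, noise_std=0.05, rng=None):
    """Concentrate a state's action particles on high-value regions.

    particles: 1-D array of candidate actions for one state.
    q_values:  estimated action values, one per particle.
    """
    rng = rng or np.random.default_rng()
    # Importance weights via a softmax over the value estimates
    # (an assumption; the paper's exact weighting may differ).
    w = np.exp(q_values - q_values.max())
    w /= w.sum()
    # Resample with replacement, then jitter to preserve diversity.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx] + rng.normal(0.0, noise_std, size=len(particles))
```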
Reinforcement Learning In Real-Time Strategy Games
We consider the problem of effective and automated decision-making in modern real-time strategy (RTS) games through the use of reinforcement learning techniques. RTS games constitute environments with large, high-dimensional and continuous state and action spaces with temporally-extended actions. To operate under such environments we propose Exlos, a stable, model-based Monte Carlo method. Contra...
On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning
Temporal-difference-based deep reinforcement learning methods have typically been driven by off-policy, bootstrapped Q-learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets exhibits superior performance and stability comp...
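A hedged sketch of what mixing the two targets could look like for a critic update: a convex combination of the on-policy Monte Carlo return and the off-policy one-step bootstrap target. The coefficient beta and the function below are illustrative assumptions, not the paper's exact formulation:

```python
def mixed_target(rewards, gamma, q_next, beta=0.5):
    """Convex mix of an on-policy Monte Carlo return and an off-policy TD target.

    rewards: rewards observed from time t to the end of the episode.
    q_next:  critic estimate Q(s_{t+1}, mu(s_{t+1})) for the bootstrap target.
    beta:    mixing coefficient (1.0 -> pure Monte Carlo, 0.0 -> pure TD).
    """
    # On-policy Monte Carlo return: G_t = sum_k gamma^k * r_{t+k}.
    mc_return = sum(gamma ** k * r for k, r in enumerate(rewards))
    # Off-policy one-step bootstrap target: r_t + gamma * Q(s_{t+1}, mu(s_{t+1})).
    td_target = rewards[0] + gamma * q_next
    return beta * mc_return + (1.0 - beta) * td_target
```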
Monte Carlo POMDPs
We present a Monte Carlo algorithm for learning to act in partially observable Markov decision processes (POMDPs) with real-valued state and action spaces. Our approach uses importance sampling for representing beliefs, and Monte Carlo approximation for belief propagation. A reinforcement learning algorithm, value iteration, is employed to learn value functions over belief states. Finally, a sa...
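The belief representation mentioned here can be pictured as a particle set reweighted by importance sampling after each observation. A schematic sketch, in which transition and observation_likelihood are hypothetical model callbacks rather than functions from the paper:

```python
import numpy as np

def belief_update(particles, weights, action, observation,
                  transition, observation_likelihood, rng=None):
    """One importance-sampling update of a particle-based belief.

    transition(s, a):             samples a successor state (hypothetical callback).
    observation_likelihood(o, s): likelihood of observation o in state s (hypothetical).
    """
    rng = rng or np.random.default_rng()
    # Propagate every particle through the transition model.
    proposed = np.array([transition(s, action) for s in particles])
    # Reweight by how well each propagated particle explains the observation.
    w = weights * np.array([observation_likelihood(observation, s) for s in proposed])
    w /= w.sum()
    # Resample to avoid weight degeneracy; weights reset to uniform.
    idx = rng.choice(len(proposed), size=len(proposed), p=w)
    return proposed[idx], np.full(len(proposed), 1.0 / len(proposed))
```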
Decision making with inference and learning methods
In this work we consider probabilistic approaches to sequential decision making. The ultimate goal is to provide methods by which decision making problems can be attacked by approaches and algorithms originally built for probabilistic inference. This in turn allows us to directly apply a wide variety of popular, practical algorithms to these tasks. In Chapter 1 we provide an overview of the gen...
Publication date: 2007